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Abstract —This paper introduces matrix product state (MPS) 
decomposition as a computational tool for extracting features 
of multidimensional data represented by higher-order tensors. 
Regardless of tensor order, MPS extracts its relevant features 
to the so-called core tensor of maximum order three which can 
be used for classification. Mainly based on a successive sequence 
of singular value decompositions (SVD), MPS is quite simple 
to implement without any recursive procedure needed for opti¬ 
mizing local tensors. Thus, it leads to substantial computational 
savings compared to other tensor feature extraction methods 
such as higher-order orthogonal iteration (HOOI) underlying the 
Tucker decomposition (TD). Benchmark results show that MPS 
can reduce significantly the feature space of data while achieving 
better classification performance compared to HOOI. 

Index Terms —Higher-order tensor, tensor feature extraction, 
supervised learning, tensor classification, matrix product state 
(MPS), core tensor, dimensionality reduction, Tucker decompo¬ 
sition (TD). 


I. Introduction 

T HERE is an increasing need to handle large multidi¬ 
mensional datasets that cannot efficiently be analyzed or 
processed using modern day computers. Due to the curse of 
dimensionality it is urgent to investigate mathematical tools 
which can evaluate information beyond the properties of large 
matrices (TJ. The essential goal is to reduce the dimension¬ 
ality of multidimensional data, represented by tensors, with a 
minimal information loss by mapping the original tensor space 
to a lower-dimensional tensor space through tensor-to-tensor 
or tensor-to-vector projection Q]. The most natural option for 
these kinds of projection is to utilize an appropriate tensor 
decomposition 0 to represent the original tensor in terms of 
a combination of possibly lower-order tensors. 

A popular method for tensor decomposition is the Tucker 
decomposition (TD) a, also known as higher-order singular 
value decomposition (HOSVD) when orthogonality constraints 
are imposed 0. As a tensor-to-tensor projection, it is an 
important tool for solving problems related to feature ex¬ 
traction, feature selection and classification of large-scale 
multidimensional datasets in various research fields. Its well- 
known application in computer vision was introduced in ID to 
analyze some ensembles of facial images represented by fifth- 
order tensors. In data mining, the HOSVD was also applied to 
identify handwritten digits J6). In addition, the HOSVD has 
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been applied in neuroscience, pattern analysis, image classifi¬ 
cation and signal processing 0. El, 0- The central concept 
of using the TD is to decompose a large multidimensional 
tensor into a set of common factor matrices and a single core 
tensor which is considered as reduced features of the original 
tensor in spite of its lower dimension 0. In practice, the 
TD is often performed in conjunction with some constraints, 
e.g. nonnegativity, orthogonality, etc., imposed on the common 
factors in order to obtain a better feature core tensor 0. 
However, constraints like orthogonality often leads to an NP- 
hard computational problem ED- Practical application of the 
TD is normally limited to small-order tensors. This is due 
to the fact that the TD core tensor preserves the higher- 
order structure of the original tensor, with its dimensionality 
remaining fairly large in order to capture relevant interactions 
between components of the tensor 0. 

The higher-order orthogonal iteration (HOOI) HD is an 
alternating least squares (ALS) for finding TD approxima¬ 
tion of a tensor. Its application to independent component 
analysis (ICA) and simultaneous matrix diagonalization was 
investigated in fl2l . Another TD-based method is multilinear 
principal component analysis (MPCA) ED, an extension of 
classical principal component analysis (PCA), which is closely 
related to HOOI. The motivation behind MPCA is that gener¬ 
ally PCA takes vectors as inputs for dimensionality reduction, 
hence tensor data would need to be vectorized and this can 
result in large computational and memory requirements, even 
for low order data. 

The matrix product state (MPS) decomposition ini, m, 
ED, ra is a tensor-to-tensor projection that has been pro¬ 
posed and applied to study quantum many-body systems with 
great success, prior to its introduction to the mathematics 
community under the name tensor-train (TT) decomposition 
E2. however, to the best of our knowledge its application to 
machine learning and pattern analysis has not been proposed. 
The MPS decomposition is fundamentally different from the 
TD in terms of its geometric structure as it is made up of local 
component tensors with maximum order three. Consequently, 
applying the MPS decomposition to large higher-order tensors 
can potentially avoid the computational bottleneck of the TD 
and related algorithms. 

Motivated by both the TD and MPS decompositions, we 
propose to use MPS as a dimensionality reduction technique 
that consists of low-order common factors and a low-order 
core tensor. Specifically, MPS decomposes a higher-order 
tensor in such a way that its MPS representation is expressed in 
a mixed-canonical form ED- In this form, a unique core tensor 
can be extracted and is naturally described by an orthogonal 
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space spanned by the common factors. This new approach 
provides a unique and efficient way of feature extraction 
applied to tensor classification problem. Specifically, in the 
tensor classification problem it is applied to firstly extract the 
core tensor and common factors for the training set. Then 
the core tensor of test set is extracted by means of common 
factors. Once the core tensors of both training and test sets 
are acquired, they can be used for classifiers such as K-nearest 
neighbors (KNN) and linear discriminant analysis (LDA). 

When compared to HOOI, MPS is not only simpler to 
implement but also more effective in terms of computational 
savings, feature space and classification success rate (CSR). 
This is due to the fact that MPS can obtain orthogonal common 
factors based on successive SVDs without needing any recur¬ 
sive local optimization procedure. We use supervised learning 
(classification) problems to benchmark MPS and compare its 
performance with HOOI. The datasets include the Columbia 
object image libraries 100 (COIL-100) lfl9l . |20i |, the brain- 
computer imagery (BCI) dataset from Shanghai Jiao Tong 
University EQ, the extended Yale Face Database B (EYFB) 
from the Computer Vision Laboratory of the University of 
California San Diego l22l . Experimental results show that in 
most cases, MPS provides better CSR compared to HOOI. 

The rest of the paper is structured as follows. Section [II] 
introduces mathematical notation and preliminaries used in 
the paper. It then formulates the tensor classification problem 
and describes how to solve it utilizing the TD. Section [HI] 
describes in detail how to apply the concept of MPS to tensor 
classification problem, and subsequently proposes the idea 
of the MPS core tensor and common factors. MPS is then 
described in detail, followed by computational complexity 
analysis. In Section |IV| experimental results are shown to 
compare MPS to HOOI. The conclusions are given in Section 

El 

A preliminary result of this work was presented in |[23l . 
In the present paper, we rigorously introduce MPS algorithm 
with computational complexity analysis. Also, new benchmark 
results are rigorously compared with those obtained by HOOI 
to show the advantages of MPS. In this context, we show that 
MPS can circumvent the problem of unbalanced matricization 
incurred in HOOI. 

II. Tensor classification 

To make the paper self-contained we introduce some nota¬ 
tions and preliminaries of multilinear algebra 0- A tensor is 
a multidimensional array and its order (also known as way or 
mode) is the number of dimensions it contains. Zero-order 
tensors are scalars and denoted by lowercase letters, e.g., 
x. A first-order tensor is a vector and denoted by boldface 
lowercase letters, e.g., x. A matrix is a second order tensor 
and denoted by boldface capital letters, e.g., X. A higher- 
order tensor (tensors of order three and above) are denoted 
by boldface calligraphic letters, e.g., X. Generally, an Mh- 
order tensor is denoted as X £ ftixi 2 x-xt N ^ where each It 
is the dimension of the local subspace i. We also denote as 
the ith entry of a vector x and x, :) as an element of a matrix 
X. Generally, an element of an ATh-order tensor X is denoted 


A mode-n fiber of a tensor X £ j s defined 

by fixing all indices but i n and denoted by Xj 1 ...i n _ i: i n+1 ...i JV . 

Mode-n matricization (also known as mode-n unfolding 
or flattening) of a tensor X £ xi 2 x—xi N j s p ro . 
cess of unfolding or reshaping the tensor into a matrix 
X(„) £ by rearranging the mode-n 

fibers to be the columns of the resulting matrix. Tensor element 
(«t, ■ ■. ,i n -i,in,in+i, ■ ■ •, %n ) maps to matrix element ( i n ,j ) 
such that 

N k -1 

j = 1 + ^2 ~ l ) J k with Jk= n Im- (1) 

The mode-n product of a tensor X £ jjAx^x — x/iv w j t h a 
matrix A £ x/ " results into a new tensor of size 1 1 x • • ■ x 

I n —i x J n x I n+ 1 x • • • x Jjv which is denoted as X x n A. 
Elementwise, it is described by 




-ljnin + l-'-iN 



The inner product of two tensors X ,y £ 
defined as 


■■■iN a jnin ' 


( 2 ) 


jj/ix/ 2 x---x/jv j s 


1 1 A t y 

( X,y) — ^ ' 'y ' ••• y ' ( 3 ) 

*1=1 *2 = 1 *w = l 


Accordingly, the Frobenius norm of X is || X\\p = \J{X. X). 

Having sufficient notations and preliminaries of multilinear 
algebra, we are considering the following tensor classification 
problem: 

Tensor classification problem. Given a set of K train¬ 
ing samples represented by Nth-order tensors X ^ £ 

jj/ix/ 2 x---x/iv — 1 ,2,...,K) corresponding to D cat¬ 
egories, and a set of L test data y 11 ' 1 £ Rkxhx-x/jv 
(i = 1,2classify the test data into the categories 
D with high accuracy. 

The problem is usually addressed by the following steps: 

• Step 1: Apply tensor decomposition method to the train¬ 
ing set find a set of common factors and corresponding 
reduced features of each training sample X 1L1 . 

• Step 2: Extract the reduced features of each test sample 
y <k) in the test set using the common factors in Step 1. 

• Step 3: Perform classification based on the reduced 
features of training and test sets using conventional 
methods m such as K-nearest neighbors (KNN) and 
linear discriminant analysis (LDA). 

For Step 1, the authors in J7] proposed methods based on the 
TD to obtain the common factors and the core tensor from the 
training set. More specifically, the K training sample tensors 
are firstly concatenated along the mode (N + 1) so that the 
training set is represented by an (N + l)th-order tensor X 
defined as 


X = [X^X^ ■ ■ ■ X^ K ^] £ K AX/ 2 X...X/«XA (4) 

TD-based method such as HOOI |QTj is then applied to have 
the approximation 


X « 


as Xj f , . 


5xiA (1) x 2 A (2) -- - x^v A ( N \ 


(5) 
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where each matrix = [a^, a^,...,a^] £ M. IjXA t(j = 
1,2is orthogonal, i.e. A i:i)T A U '> = I (I £ K A ^ xA > 
denotes the identity matrix). It is called by a common factor 
matrix and can be thought of as the principal components in 
each mode j. The parameters Aj satisfying 


A j < rank(X(j)) (6) 

are referred to as the bond dimensions or compression ranks 
of the TD. The (TV + l)th-order tensor 

g g x A 2 x ■■■x A n xif 


is called the core tensor, which contains reduced features of 
the training samples, is represented in the subspace spanned by 
the common factors A^\ More specifically, if Q is matricized 
such that G(jv+i) £ fljAx(AiA 2 —a*,^ eac }j row fc 0 f G 
represents 

N 

= n (?) 

i=i 

number of reduced features of the corresponding sample X <kl 
in the training set. 

The core tensor Q and common factors A'" are found as 
the solution of the following nonlinear least square 

min 11 X — n xjU (1) x 2 u (2) --- x N 

■R.UW) (8) 

subject to (xj(T)) t xj(T) =l,j = 1,TV, 

which is addressed by alternating least square (ALS) in each 
U^-* (with other U^, i ^ j held fixed). The computation 
complexity per one iteration consisting of TV ALS in U (j \ 
j = 1,..., TV is (251 

0(KAI N + NKIA 2 ( n - 1 ') + TVA'A 3(jv - 1} ) (9) 


for 

I 3 = I and Aj = A, j = 1,2,..., TV. (10) 

After computing the core tensor and the common factors 
for the training set, we proceed to Step 2 to extract the core 
tensor containing reduced features for the test set. Specifically, 
the core tensor for the test set is given by 

e = yx 1 (A (1 )) r -x ff (AW) T , (id 


where the test set is defined as 


3; = [j/ 1 );}/ 2 ) • • • £ RJixJ 2 x-xJ w x£ (i2) 


and Q £ 


ilXA2X--XAjvXL 


. Again, the test core tensor Q 


can be matricized such that Q ( 


{(AT+ 1 ) fc ^ x(AiA2 '" An) , each 
row l of Q represents Nf = ]~[ (=1 Aj number of reduced 
features of the corresponding sample y <k> in the test set. 
Finally, G(jv+i) and Q(jv+t) can be used for performing 
classification according to Step 3. 

Although the core tensors Q and Q can be used for direct 
training and classification in Step 3, their dimensionality often 
remain large. This is due to the fact that they retain the same 
orders of their own original tensors. Thus, it may require a 
further dimensionality reduction of the core tensors using tech¬ 
niques such as Fisher score ranking before inputting them into 


classifiers to improve the classification accuracy 1(7). Besides, 
the computational complexity of HOOI to obtain ([5]) is high as 
<0 for the computational complexity per one iteration shows, 
which may become prohibitive when the order of the training 
tensor is large. In addition, due to the unbalanced single-mode 
matricization (one mode versus the rest) of the tensor when 
using HOOI, it may not be capable of capturing the mutual 
correlation between modes of the tensor which usually needs 
multimode matricization (a few modes versus the rest). Hence, 
it might cause loss of important information for classification 
while decomposing the tensor. To circumvent these issues, we 
propose to use the MPS decomposition in the next section. 


III. Matrix Product State Decomposition for 
Tensor Feature Extraction 

In this section we develop a tensor feature extraction method 
based on MPS decomposition as an alternative solution to the 


above stated tensor classification problem. Subsection III-A 
presents the concept of common factors and a core tensor 
for the MPS decomposition of tensor feature extraction and 
classification problems. Then the MPS is proposed in Subsec¬ 
tion III-B We finally analyse the computational complexity of 


MPS in comparison with HOOI in Subsection |III-C 


A. Common factors and core tensor of matrix product state 
decomposition 

We now introduce the concept of core tensor and common 
factors of an MPS representing the training tensor X in Q. 
Without loss of generality, let us opt to permute the mode K 
of the tensor X such that 

X £ M^ 1 x—J n _!x KxI„ -xI N 

(the mode K is now located at n which can be chosen 
arbitrarily but conventionally we choose it at the middle of 
the chain, say n = round(TV/2)). Accordingly, positions of 
modes /„ ,.... / y are shifted by one to the right. Then, we 
present the elements of X in the following mixed-canonical 
form QD of the matrix product state (MPS) or tensor train 
(TT) decomposition lfl6l . Ifl4l , |[T5l . ifTTI : 

„ _rW Tj( n— l)p ( ra ) f'( n +l) p(A , +l) flTI 

•Til •••*;...JJV — -Djj y ' tN > 

where B A) and i (the upper index “(j)” denotes the 
position j of the matrix in the chain) of dimension Arj_ij x Aj 
(Ao = Atv + i = 1), are called “left” and “right” common 
factors which satisfy the following orthogonality conditions: 

V = I, (j = 1,... ,n — 1) (14) 

and 

E c ?J_ 1) ( c g_ 1 )) T = 1 (j = n + 1,... ,N + 1 $15) 

Hi- 1) 

respectively, where I denotes the identity matrix. Each G[ n> 
for k = 1,2,..., K is a matrix of dimension A„_i x A n and 
the MPS core tensor is defined by 

gin) = [ G W qW . . . Q^)j g R A„_ lX A„xA' 
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which describes the reduced features of the training set. The 
parameters A j are called the bond dimensions or compression 
ranks of the MPS. 


Using the common factors IT'' and C\ J> , we can extract 

l 3 Hi-!) 

the core tensor for the test tensor y. Specifically, we first need 


-U) 


to permute y defined in Eq. (12 1 in such a way that the index 
£ is at the same position as k in the training tensor to ensure 
the compatibility between the training and test tensors, i.e., 
the permuted 

G X " In— 1 X Lxln"- X/jv 

Then the core tensor Q I ' H ' > of the test tensor y is given by 
(n) __ [/-)(") (\i n ) . . . r*( n )l e iraA„-ixA„xL 


Q (n > = [Q, Q 


where 


o {n) 

^(0 






(Cir 1 ')". (16) 

Note that as the core tensors and Q^ n - of both training 
and test tensors are extracted by using the same common 
factors, they are represented by the same base. Thus, they 
can be used for the classification directly for Step 3. More 
precisely, we can matricize Q' n> and Q' ri) to 


G (n) 


and 


Q 


vK X (A n _iA n ) 


? Lx(A n _iA n ) 


such that each of their rows, either G^ or Q <n ' 1 is a sample 
containing 

N f = A n _ 1 A n (17) 

number of reduced features of the original sample in either 
training or test set, respectively. 


In the next section we will show that Eq. (13 1 can be 
implemented straightforwardly without any recursive local 
optimization procedure like ALS in HOOI required. This 
results into substantial computational savings. Thus, the MPS 
can overcome the aforementioned issues of HOOI. 


B. Matrix product state for feature extraction 

We describe the MPS method for computing the core tensor 
and common factors of the training set. More specifically, 
we show how to decompose the training tensor X into the 
MPS according to Eq. To this end, we apply two 

successive sequences of SVDs to the tensor which include 
left-to-right sweep for computing the left common factors 
and right-to-left sweep for computing the 

( n+1 ' > q( n + 1 'I anc j CQre tensor 

i N 


B 


(i) 


,B 


(n-l) 


right common factors C.- 


Q' n> in Eq. (13 i explained in the following!]!]]: 

• Left-to-right sweep for left factor computation: 


U) 


The left-to-right sweep involves acquiring matrices B.) 
(ij = 1..... /j , where j = 1,..., n — 1) fulfilling orthog¬ 
onality condition in Eq. Let us start by performing the 
mode-1 matricization of X to obtain the matrix 


Then applying the SVD to W such that 

w = USV T . 

We then define the first common factors B,^ = U, £ M lxAl , 
where Ai < raukl'X, | j), satisfying the left-canonical con¬ 
straint in Eq. (fl4] due to the SVD. In order to find the next 
1 -* (2) 

common factors B ? - 2 we firstly form the matrix 

= sy T g R Aix(/ 2 -if-/„)_ 

The matrix W is then reshaped to 

and its SVD is given by 

w = USV T . 


Reshape the matrix 

U £ r( a i / 2 )x a 2 (A 2 < rank(W)) 

into a third-order tensor 

U e r a iX / 2 xA 2 

( 2 ) 

and we define B," = U, 2 satisfying the left-canonical 

constraint due to the SVD. Applying a same procedure for 

(3) 

determining B) 3 by forming a new matrix 

w = SV T £ rA 2 x(i 3 -k-i n ), 


reshaping it to 


performing SVD and so on. This procedure is iterated until 
we obtain all the matrices B - (ij = 1..... / y , where j = 
1,..., n— 1) fulfilling the left-canonical constraint in Eq. 13- 
In a nutshell, after completing the left-to-right sweep ele¬ 
ments of tensor X are written in the following form: 


ki n 


■lN+1 


r( 1 ) r* ( n_ 1 )\x/ 

•••%_ i W( kin 


( 18 ) 


where the matrix W is reshaped to the matrix form W £ 
g(A„_ 1 if-/ J ,_i)xJ N f or t j le next r ight-to-left sweeping pro¬ 
cess. 

• Right-to-left sweep for right factor computation: 

Similar to left-to-right sweep, we perform a sequence of 
SVDs starting from the right to the left of the MPS to get 
the matrices c| j) _ (i(j- 1 ) = 1, ■ • ■ ,I(j- 1 ), where j = n + 

1 ,... ,7V + 1) fulfilling the right-canonical condition in Eq. 

To start, we apply the SVD to the matrix W obtained 
previously in the left-to-right sweep such that 

w = USV T . 


Let us then define 

p(JV+l) yT c rnAjvXl 

'-i JV v in “■ ’ 

where A ..y < rank(W), which satisfies the right-canonical 
constraint (Eq. ( fl5] l) due to the SVD. Next, multiply U and S 
together and reshape the resulting matrix into 


W £ R / ix(/ 2 "-Jf---/jv)_ 


W £ R(A„_i/f---Jjv_2)x(/jv-iAjv) 
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Again, applying the SVD to the matrix W, we have 


w = USV T . 


Reshape the matrix 

V T G ]]J A «--ix(/iv_iA N ) 

where Ajv-r < rank(W), into a third-order tensor 

V G ]J| An -i xUv_i x A^ 

and we define the next common factor C' N ’ = V,. „ ,, 
satisfying Eq. ( |15] >. We needs to obtain the matrix W by 
multiplying U and S together for determining the next common 
factor, i.e. J . This procedure is iterated until all the 

common factors Q 7 (i(j-i) = 1, ■ ■ •, where j = 

n +1,..., TV +1) are acquired. In the end, we obtain Eq. © 
for MPS decomposition of the tensor X where the core tensor 

g( n ) g jg,A„_ixA„xtf 


TABLE I: Matrix product state for tensor feature extraction 


Input: 

Output: 


X £ R J 1 V " ' X 1 " - 1 N K " ' X 1 N 
e: SVD threshold 

gO) g|A„_ 1 x4„xi( ! 


> U) 
V 

->U) 


(ij = 1,... ,Ij,j = 1,... ,n - 1) 

!■ 1*0-1) = = n + 1, ■ • •, N + 1) 


Set W = X (1) 
for j — 1 to n — 1 
w = USV T 
W : 


: USV T 


Bp ) = U; 


W = SV 


% Mode-1 matricization of X 
% Left-to-right sweep 
% SVD of W 

% Thresholding S using Eq. |2o] 
% Set common factors 
% Construct new matrix W 


end 


8: Reshape W S 

9: for j = N + 1 down to n + 1 % right-to-left sweep 

10: W = USV T % SVD of W 

11: Ws:USV T % Thresholding S using Eq. |2oJ 

12: C (j> = V, r % Set common factors 

*w-y_ 

13: W = US % Construct new matrix W 

14: end 

15: Set C/( ra ) = VV % Training core tensor 
Texts after symbol are comments. 


is determined by reshaping the matrix 

G (n) = US G 

Having done this, we can substitute the common factors into 
Eq. @ to extract the core tensor for the test tensor. 

Note that the MPS decomposition described by Eq. 
can be performed exactly or approximately depending on the 
bond dimensions A j (j = 1,, N ) which have the following 
bound fT7) : 


A j < rank(W) < rank(Xy]), (19) 


versus their counterpart (|6ji in HOOI, where rank(Xy]) de¬ 
notes the rank of the matrix Xm of size (/| A ■ ■ ■ l :l ) x 
(I j+1 ■■■K---I n ) which is the inode-{ 1,2,..., j) matriciza¬ 
tion of the tensor X. In practice, each bond dimension Ay is 
usually truncated to be smaller than rank(W) on every SVD of 
W leading to an efficient MPS decomposition. To this end, we 
rely on thresholding the singular values of W. For instance, 
applying SVD to the matrix W G M. lxj (let us assume I <J), 
we have W = USV T , where S = diag{s\ 1 s 2 , ■ ■ •, sj) are the 
nonvanishing singular values. With a threshold e being defined 
in advance, we truncate the bond dimension by keeping only 
A singular values such that 


Ejtr*; 

ELi-j 


> e. 


( 20 ) 


Having done this, we have W « USV , where U G R /xA , 
S G R AxA and V T G M AxJ . Note that the larger the e 
the more accurate MPS decomposition of the tensor X but 
less efficient in reducing the dimensionality of the tensor. 
Therefore, one needs to choose an appropriate value for 
e via empirical simulations. A summary of applying MPS 
decomposition for tensor feature extraction can be found in 
Table □ 


C. Complexity analysis 

For a given training tensor X G 


L xI 2 x---xInxK unc | er 


the assumption (10 1 , the TD and MPS representation of X 
consists of 

NIA + KA n 


and 


(TV - 2)1 A 2 + KA 2 + 21A 


parameters, respectively. The dominant computational com¬ 
plexity of MPS is C?(iT/( Ar+1 )) due to the first SVD of the 
matrix obtained from mode-1 matricization of X. On the 
other hand, the computational complexity of HOOI is 0 
per iterations with unknown iteration number to attained the 
convergence of ALS rounds. In addition, it usually employs 
the HOSVD to initialize the tensors which involves the cost 
of order 0(NKI n+1 ), and thus very expensive with large TV 
compared to MPS. 


IV. Experimental results 


In this section, we apply MPS to perform feature extraction 
for classification problem. The method is applied to a few 
datasets and compared with one of the most popular feature 
extraction methods, i.e. HOOI. Specifically, three datasets, 
namely Columbia Object Image Libraries (COIL)-100 03, 
ll20l . Extended Yale Face Database B (EYFB) 1221 and BCI 
Jiaotong dataset (BCIJ) ED, are used to benchmark the 
simulation. In all simulations, we rely on the threshold e 


defined in Eq. (20 1 to adjust the bond dimensions of MPS 


in Eq. ( fl3j ) as well as that of HOOI in Eq. <[5]). 

The COIL-100 dataset has 7200 color images of 100 objects 
(72 images per object) with different reflectance and complex 
geometric characteristics. Each image is initially a 3rd-order 
tensor of dimension 128 x 128 x 3 and then is downsampled 
to the one of dimension 32 x 32 x 3. The dataset is divided 
into training and test sets randomly consisting of K and L 
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images, respectively according to a certain holdout (H/O) ratio 
r, i.e. r = jr. Hence, the training and test sets are represented 
by four-order tensors of dimensions 32 x 32 x 3 x K and 
32 x 32 x 3 x I, respectively. In Fig. [I] and Fig. [2] we show 
how a few objects of the training and test sets (r = 0.5 is 
chosen), respectively, change after applying MPS and HOOI 
to reduce the number of features with two different values of 
threshold, e = 0.9,0.65. In both training and test sets, we can 
see that with e = 0.9, the images are not modified significantly 
due to the fact that many features are preserved. However, in 
the case that e = 0.65, the images are blurred. That is because 
less features are kept. However, we can observe that the shapes 
of objects are still preserved. Especially, in most cases MPS 
seems to preserve the color of the images better than HOOI. 
This is because the bond dimension corresponding to the color 
mode I 3 = 3 has a small value, e.g. A 3 = 1 for e = 0.65 
in HOOI. This problem arises due to the the unbalanced 
matricization of the tensor corresponding to the color mode. 
Specifically, if we take a mode-3 matricization of tensor 
X € R 32x32x3xi % the resulting matrix of size 3 x (10247T) is 
extremely unbalanced. Therefore, when taking SVD with some 
small threshold e, the information corresponding to this color 
mode may be lost due to dimension reduction. On the contrary, 
we can efficiently avoid this problem in MPS by permuting 
the tensor such that X G jg>32x.R'x3x32 before applying the 
tensor decomposition. 



) Original samples (size 32 x 32 x 3) 




(b) Samples with MPS, e = 0.9 (core size 18 x 24) 



(c) Samples with HOOI, e = 0.9 (core size 18 x 16 x 2) 



(d) Samples with MPS, e = 0.65 (core size 6x3) 



(e) Samples with HOOI, e = 0.65 (core size 5x4x1) 



Fig. 1: Modification of a 10 objects in the training set of 
COIL-100 are shown after applying MPS and HOOI corre¬ 
sponding to e = 0.9 and 0.65 to reduce the number of features 
of each object. 


To validate MPS for classification, the core tensors with 
full sizes obtained from both methods are input directly to a 
classifier which is chosen as the K nearest neighbor with K= 1 
(KNN-1) in our case. For each H/O ratio, the classification 
success rate (CSR) is averaged over 10 iterations of randomly 
splitting the dataset into training and test sets. Comparison 
of performance between MPS and HOOI is shown in Fig. [3] 
for four different H/O ratios, i.e. r = (50%, 80%, 90%, 95%). 
In each plot, we show the CSR with respect to threshold e. 
We can see that MPS performs quite well when compared 


(a) Original samples (size 32 x 32 x 3) 



(b) Samples with MPS, e = 0.9 (core size 18 x 24) 
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(c) Samples with HOOI, e = 0.9 (core size 18 x 16 x 2) 



(e) Samples with HOOI, e = 0.65 (core size 5x4x1) 
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Fig. 2: Modification of 10 objects in the test set of COIL-100 
are shown after applying MPS and HOOI corresponding to 
e = 0.9 and 0.65 to reduce the number of features of each 
object. 


to HOOI. Especially, with small e, MPS performs much 
better than HOOI. Besides, we also show the best CSR 
corresponding to each H/O ratio obtained by different methods 
in Table. [II| It can be seen that MPS always gives better results 
than HOOI even in the case of small value of e and number 
of features Nf defined by (|7j) and (17 1 for HOOI and MPS, 
respectively. 
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Fig. 3: Error bar plots of CSR versus thresholding rate e for 
different H/O ratios. 


We also perform experiment on the EYFB dataset which 
contains 16128 grayscale images with 28 human subjects, 
under 9 poses, where for each pose there is 64 illumination 
conditions. Similar to (26], to improve computational time 
each image was cropped to keep only the center area con¬ 
taining the face, then resized to 73 x 55. The training and test 
datasets are not selected randomly but partitioned according 
to poses. More precisely, the training and test datasets are 
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TABLE II: COIL-100 benchmark: The best CSR corresponding to different H/O ratios obtained by MPS and HOOI. 


Algorithm 

CSR 

N f 

e 

CSR 

N f 

e 

r = 50% 

HOOI 

98.87 ± 0.19 

198 

0.80 

r = 85% 

94.13 ±0.42 

112 

0.75 

MPS 

99.19 ±0.19 

120 

0.80 

95.37 ± 0.31 

18 

0.65 

r = 90% 

HOOI 

87.22 ±0.56 

112 

0.75 

r = 95% 

77.76 ±0.90 

112 

0.75 

MPS 

89.38 ±0.40 

59 ±5 

0.75 

83.17 ± 1.07 

18 

0.65 


selected to contain poses 0, 2, 4, 6 and 8 and 1, 3, 5, and 7, 
respectively. For a single subject the training tensor has size 
5 x 73 x 55 x 64 and 4 x 73 x 55 x 64 is the size of the test 
tensor. Hence for all 28 subjects we have fourth-order tensors 
of sizes 140 x 73 x 55 x 64 and 112 x 73 x 55 x 64 for the 
training and test datasets, respectively. 

We apply the MPS and HOOI to extract the core tensors 
before inputting them into the classifiers. However, in this 
experiments we realize that the size of each core tensor 
remains very large even with small threshold used, e.g., for 
e = 0.75, the core size of each sample obtained by MPS 
and HOOI are 18 x 201 = 3618 and 14 x 15 x 13 = 2730, 
respectively which is not useful for directly classifying. There¬ 
fore, we need to further reduce the sizes of core tensors 
before feeding them to classifiers for a better performance. 
In our experiment, we simply apply a further truncation to 
each core tensor by keeping the first few dimensions of each 
mode of the tensor. Intuitively, this can be done as we have 
already known that the space of each mode is orthogonal and 
ordered in such a way that the first dimension corresponds 
to the largest singular value, the second one corresponds to 
the second largest singular value and so on. Therefore, we 
can independently truncate the dimension of each mode to a 
reasonably small value (which can be determined empirically) 
without changing significantly the meaning of the core tensor. 
It then gives rise to a core tensor of smaller size that can 
be used directly for classification. More specifically, suppose 
that the core tensors obtained by MPS and HOOI have sizes 
Q x Ai x A 2 and Q x Ai x A 2 x A 3 , where Q is the 
number K (L ) of training (test) samples, respectively. The 
core tensors are then truncated to be Q x A x A 2 and 
QxAixA 2 xA 3 , respectively such that A i < A i (l = 1, 2, 3). 
Note that each A; is chosen to be the same for both training 
and test core tensors. We show the classification results for 
different threshold values e in Table. m using two different 
classifiers, i.e. KNN-1 and LDA. In this result, the core tensors 
obtained by MPS and HOOI are reduced to have sizes of 
Q x Ai x A 2 and Q x A 3 x A 2 x A 3 , respectively such 
that A : = A 2 = A e (10,11,12,13,14) and A 3 = 1 . As 
a result, the reduced core tensors obtained by both methods 
have the same size for classification. Note that each value of 
CSR in Table. Ill is computed by taking the average of the 
ones obtained from classifying different reduced core tensors 
due to different A. We can see that the MPS gives rise to 
better results for all threshold values using different classifiers. 
More importantly, MPS with smallest e can also produce CSR 
as well as largest e. The LDA classifier gives rise to the best 
result, i.e. 97.32 ± 0 . 89 . 

Lastly, we study the BCIJ dataset which consists of single 


trial recognition for BCI electroencephalogram (EEG) data 
involving left/right motor imagery (MI) movements. The orig¬ 
inal experiment had five subjects and the paradigm required 
subjects to control a cursor by imagining the movements of 
their right or left hand for 2 seconds with a 4 second break 
between trials. Subjects were required to sit and relax on a 
chair, looking at a computer monitor approximately lm from 
the subject at eye level. For each subject, data was collected 
over two sessions with a 15 minute break in between. The 
first session contained 60 trials (30 trials for left, 30 trials 
for right) and were used for training. The second session 
consisted of 140 trials (70 trials for left, 70 trials for right). The 
EEG signals were sampled at 500Hz and preprocessed with 
a filter at 8-30Hz, hence for each subject the data consisted 
of a multidimensional tensor channel x time x trial. Prior 
to simulation we preprocessed the data by transforming the 
tensor into the time-frequency domain using complex Mortlet 
wavelets with bandwidth parameter fo = 6 Hz (CMOR6-1) 
to make classification easier Gil, El. The wavelet center 
frequency f c = 1Hz is chosen. Hence, the size of the 
concatenated tensors are 62 channels x 23 frequency bins x 
50 time frames x Q. 

We perform the experiment for subject 1 and 2 of the 
dataset. Similar to the case of the EYFB dataset, after applying 
the feature extraction methods, the core tensors still have high 
dimension, so we need to further reduce their sizes before 
using them for classification. For instance, the reduced core 
sizes of MPS and HOOI are chosen to be Q x 12 x A and 
Q x 12 x A x 1, where A S ( 8 ,..., 14), respectively. For 
classification, we use LDA and the classification results are 
shown in Table. [IV]for different threshold values. We can see 
that MPS always performs better than HOOI. 

V. Conclusion 

In this paper, we propose MPS as an alternative to TD-based 
algorithms for feature extraction applied to tensor classifica¬ 
tion problem. Compared to HOOI, MPS has been shown to 
have some advantages such as computational savings due to 
successive SVDs are employed and no recursive optimization 
needed for acquiring common factors and core tensors. In 
addition, using extracted features given by core tensors for 
classifiers is capable of leading to better classification success 
rate even though a same number of features is used from both 
MPS and HOOI. We have validated our method by applying 
it to classify a few multidimensional datasets, such as visual 
data (COIL-100 and EYFB) and EEG signals where training 
and test data represented by fourth-order tensors. Benchmark 
results show that MPS gives better classification rates than 
HOOI in most cases. 










TABLE III: EYFB benchmark with reduced core tensors being used for classification, core sizes of MPS and HOOI are 
Q x A x A and Q x A x A x 1 , where A € (10,..., 14), respectively. 


Algorithm 

CSR (e = 0.9) 

CSR (e = 0.85) 

CSR (e = 0.80) 

CSR (e = 0.75) 

KNN-l 

HOOI 

90.71 ± 1.49 

90.89 ± 1.60 

91.61 ± 1.26 

88.57 ± 0.80 

MPS 

94.29 ± 0.49 

94.29 ± 0.49 

94.29 ±0.49 

94.29 ±0.49 

LDA 

HOOI 

96.07 ±0.80 

95.89 ± 0.49 

96.07 ± 0.49 

96.07 ± 0.49 

MPS 

97.32 ± 0.89 

97.32 ± 0.89 

97.32 ± 0.89 

97.32 ± 0.89 


TABLE IV: BCI Jiaotong benchmark with reduced core tensors being used for classification, core sizes of MPS and HOOI 
are Q x 12 x A and Q x 12 x A x 1, where As (8 ,..., 14), respectively. 


Algorithm 

CSR (e = 0.9) 

CSR (e = 0.85) 

CSR (e = 0.80) 

CSR (e = 0.75) 

Subject 1 
HOOI 

84.39 ±1.12 

83.37 ± 0.99 

82.04 ± 1.05 

84.80 ± 2.21 

MPS 

87.24 ± 1.20 

87.55 ± 1.48 

87.24 ± 1.39 

87.65 ± 1.58 

Subject 2 
HOOI 

83.16 ±1.74 

82.35 ± 1.92 

82.55 ± 1.93 

79.39 ± 1.62 

MPS 

90.10 ± 1.12 

90.10 ± 1.12 

90.00 ± 1.09 

91.02 ± 0.70 


For the future outlook, we plan to further improve MPS 
for classifying very big multilinear datasets. We also plan to 
extend this promising tool for many other problems such as 
multilinear data compression and completion. 
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